VAST Challenge 2021: Mini Challenge 1

Lim Jin Ru (Alethea)
07-23-2021

Overview

This assignment is based on Mini-Challenge 1 of the VAST Challenge 2021.

In the roughly twenty years that Tethys-based GAStech has been operating a natural gas production site in the island country of Kronos, it has produced remarkable profits and developed strong relationships with the government of Kronos. However, GAStech has not been as successful in demonstrating environmental stewardship.

In January 2014, the leaders of GAStech are celebrating their new-found fortune as a result of the initial public offering of their very successful company. In the midst of this celebration, several employees of GAStech go missing. An organization known as the Protectors of Kronos (POK) is suspected in the disappearance, but things may not be what they seem.

The Protectors of Kronos (POK) is a political activist movement that stemmed from concerns about contamination from drilling at the Tiskele Bend gas fields. An international agency tested water from the Tiskele River both upstream and downstream of the Tiskele Bend gas fields and confirmed that the contaminants present are consistent with pollution from Hyper Acidic Substrate Removal, a gas drilling technique that GAStech employs at the Tiskele Bend fields. These test results have also been published in several international journals.

My role is to use visual analytics to help law enforcement from Kronos and Tethys discover the relationships among the people and organizations.

Background

Mini-Challenge 1 examines the relationships and conditions that led up to the kidnapping. As an analyst, I analyse a set of current and historical news reports, the resumes of numerous GAStech employees, and the email headers from two weeks of internal GAStech company email to identify the complex relationships among all of these people and organizations.

Literature Review, Objective and Motivation

A literature review of the 2014 VAST Challenge submissions, which covered the same crime case but posed different questions, found that many of the visualizations were informative and had their own strengths, but also had some limitations and room for improvement.

Some areas that can be further improved on:

The proposed visualizations will attempt to overcome some of these limitations.

The participants used a variety of tools for their visualizations. This assignment uses R exclusively, as the R environment offers numerous visualization packages with useful functions and great adaptability, and the R community is constantly releasing new packages. Some of the newer R packages are also explored and applied, including corporaexplorer, published in 2021, and LDAvis, published in 2015.

Data Preparation

Data extraction, wrangling and data preparation were performed with R, primarily with tidyverse methods.

Answers

Qn 1: Characterize the news data sources provided. Which are primary sources and which are derivative sources? What are the relationships between the primary and derivative sources?

Several visualisations are used to determine whether a news article is a primary or a derivative source.

Many characteristics differentiate a primary source from a derivative source. A derivative source is any record that relies on other records for its information.

As derivative sources are second-hand accounts of events, they cover information from the primary source, often adding analysis and interpretation. A correlation analysis of the documents published by the news companies can therefore give us clues about the relationships between primary and derivative sources.
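The correlation analysis described above can be sketched in base R as follows. This is a minimal illustration, not the author's code: it assumes each news source is represented by the relative frequencies of the words across all of its articles, and that sources are compared with Pearson correlation. The toy articles and the tokenize helper are hypothetical.

```r
# Minimal sketch (assumption): represent each news source by the relative
# frequencies of the words in its articles, then correlate the sources.
# The toy articles below are hypothetical.
articles <- data.frame(
  source = c("Source A", "Source A", "Source B", "Source C"),
  text = c("GAStech employees missing after fire alarm",
           "police search for missing employees",
           "GAStech employees missing after the fire alarm",
           "festival season opens in Abila"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case the text and split it into words
tokenize <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z]+"))
  words[words != ""]
}

# Word counts per source, aligned on a shared vocabulary
vocab <- sort(unique(tokenize(articles$text)))
counts <- sapply(split(articles$text, articles$source), function(txt) {
  table(factor(tokenize(txt), levels = vocab))
})
# counts is a vocabulary x source matrix; convert to proportions per source
freqs <- sweep(counts, 2, colSums(counts), "/")

# Pairwise Pearson correlation between sources (columns)
source_cor <- cor(freqs)
round(source_cor, 2)
```

Sources that reuse much of the same wording (here Source A and Source B) end up with a higher correlation than unrelated ones.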

First, the correlation threshold is set very low so that we can explore the overall correlation among the news sources of all the companies.

Tethys News is an outlier and needs closer inspection.

A corpus exploration tool built with the corporaexplorer package in R was used. Tethys News’ articles carry an update timestamp rather than just a publication date, which suggests that its reporting is very recent, close to the events. News written during or close to the time of an event is a characteristic of a primary source, so Tethys News is likely to be a primary source.

A high correlation between two news companies suggests that their articles are highly similar. Either both are derivative sources that analyse and interpret each other’s work, or one is a primary source and the other a derivative source that references a substantial amount of the primary source’s content in order to analyse and interpret it. A correlation threshold of 0.70 is selected to evaluate these relationships.
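Applying the 0.70 threshold amounts to keeping only the pairs of sources whose correlation exceeds it. A minimal base R sketch, assuming a source-by-source correlation matrix is already available; the small matrix below is hypothetical, not the challenge's actual correlations.

```r
# Minimal sketch: keep only pairs of sources whose correlation exceeds a
# threshold. The correlation matrix below is hypothetical.
source_cor <- matrix(
  c(1.00, 0.85, 0.30,
    0.85, 1.00, 0.72,
    0.30, 0.72, 1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Source A", "Source B", "Source C"),
                  c("Source A", "Source B", "Source C"))
)

# Upper triangle only: each pair once, no self-pairs
idx <- which(upper.tri(source_cor), arr.ind = TRUE)
edges <- data.frame(
  item1 = rownames(source_cor)[idx[, "row"]],
  item2 = colnames(source_cor)[idx[, "col"]],
  correlation = source_cor[idx],
  stringsAsFactors = FALSE
)

threshold <- 0.70
strong_edges <- edges[edges$correlation > threshold, ]
strong_edges
```

The resulting edge list is what a network plot would draw; raising or lowering the threshold prunes or adds links between sources.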

Several clusters form, with Centrum Sentinel and Modern Rubicon lying far from the group, which suggests that they are more likely to be primary sources.

The sources that are highly correlated with each other are likely to be derivative sources.

For instance, Athena Speaks and Central Bulletin have a correlation of more than 0.8. This suggests that their news articles are highly similar.

The corpus exploration tool is used to examine the articles from the two news companies. To focus on a common topic, the corpus filter is set to “kidnap--1”, which tells corporaexplorer to keep only articles in which the word “kidnap” appears at least once.
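The idea behind this filter, keeping only documents in which a term appears at least a given number of times, can be sketched in base R. This is not corporaexplorer's implementation: the count_hits helper and the documents are hypothetical, and a case-insensitive substring match (so that "kidnap" also counts "kidnapped") is an assumption about the matching behaviour.

```r
# Minimal sketch (assumption): keep only documents in which a term occurs at
# least min_hits times, using a case-insensitive substring match.
count_hits <- function(text, term) {
  hits <- gregexpr(term, tolower(text), fixed = TRUE)
  # gregexpr returns -1 when there is no match, so count positive positions
  vapply(hits, function(h) sum(h > 0), integer(1))
}

docs <- c(
  "Ten employees were kidnapped during the fire alarm.",
  "The IPO made GAStech executives wealthy.",
  "Police say the kidnapping was planned; the kidnappers remain at large."
)

min_hits <- 1
kidnap_docs <- docs[count_hits(docs, "kidnap") >= min_hits]
kidnap_docs
```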

A comparison between Athena Speaks article 140 and Central Bulletin article 673 shows that their content is almost identical apart from some paraphrasing, suggesting that either both are derivative sources of another source or one is a record of the other.

Figure: Athena Speaks, Article 140 and Central Bulletin, Article 673
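Side-by-side reading is the approach used here; one way to put a number on "almost identical with only paraphrasing" is the Jaccard similarity of the two articles' word sets (shared words divided by all distinct words). This is a swapped-in illustrative measure, not the method used in the write-up, and the sentences are hypothetical stand-ins for the article texts.

```r
# Minimal sketch (assumption): Jaccard similarity of two texts' word sets,
# i.e. size of the intersection divided by size of the union.
word_set <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z]+"))
  unique(words[words != ""])
}

jaccard <- function(a, b) {
  sa <- word_set(a)
  sb <- word_set(b)
  length(intersect(sa, sb)) / length(union(sa, sb))
}

article_1 <- "Several GAStech employees went missing after the fire alarm"
article_2 <- "After the fire alarm, several GAStech employees went missing"
article_3 <- "The Abila festival attracted large crowds this weekend"

jaccard(article_1, article_2)  # same words reordered, so 1
jaccard(article_1, article_3)  # only "the" is shared, so 1/16
```

Because word order is ignored, a reordered paraphrase still scores highly, which matches the kind of near-duplicate seen between the two articles.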

The corpus exploration tool built with the R package also gives a quick overview of the primary and derivative sources through its term-highlighting function.

Primary sources include original information such as interviews. Hence, the term “say” can be used to locate primary-source articles, as it signals first-hand accounts; this term is entered in the red “term to chart and highlight” field. The names of other news companies can be used to locate derivative-source articles; these are entered in the blue “term to chart and highlight” field.
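The same two cues can be expressed as simple text flags. A minimal base R sketch, assuming whole-word matching of "say", "says" and "said" as the primary-source cue and a mention of any other outlet's name as the derivative-source cue; the flag_article helper, the outlet list and the sentences are hypothetical.

```r
# Minimal sketch (assumption): flag quotation cues as a hint of a primary
# source and mentions of other outlets as a hint of a derivative source.
other_outlets <- c("World Journal", "Centrum Sentinel", "Tethys News")

flag_article <- function(text, own_outlet) {
  # Whole-word match, so "saw" or "essay" do not count
  says <- grepl("\\b(say|says|said)\\b", text, ignore.case = TRUE, perl = TRUE)
  others <- setdiff(other_outlets, own_outlet)
  cites <- any(vapply(others, function(o) grepl(o, text, fixed = TRUE),
                      logical(1)))
  c(primary_cue = says, derivative_cue = cites)
}

flag_article("A witness said she saw smoke near the building.", "Tethys News")
flag_article("As reported by Centrum Sentinel, the search continues.",
             "World Journal")
```

As in the highlighted tiles, these flags are leads for a human reader to check, not a classification on their own.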

In the visualization above, primary and derivative sources can now be quickly told apart: red tiles indicate primary sources and blue tiles indicate derivative sources. For example, World Journal article 396 is flagged as a derivative source by its blue tile. Clicking the tile brings up the document information on the right, which indeed contains a reference to “corresponding times in Abila”, suggesting that World Journal is a derivative source.

Based on the visualization above, we can also infer from their red document tiles that Centrum Sentinel and Homeland Illumination publish primary-source articles.

It is helpful to understand which news sources are primary and which are derivative, as law enforcers can derive different types of value from these two groups. For example, primary sources are especially useful for obtaining the latest accurate updates with minimal “embellishment” that could confuse the case. Derivative sources are especially useful for understanding the history of related personnel or uncovering the goals of suspects, since additional research, compilation and analysis have gone into them.

Qn 2: Characterize any biases you identify in these news sources, with respect to their representation of specific people, places, and events. Give examples.

The news sources have different biases, which lead them to view the potential culprits from different perspectives and reach different hypotheses. While bias is typically negative, in this case it can help us examine the possible motivations of suspects and decide which scenarios to investigate.

The examination focuses on 2014, as the kidnapping occurred in January 2014. The data is filtered to keep only articles whose text mentions 2014.

# Keep only articles whose text mentions "2014" (a text match, not a date filter)
news_articles <- dplyr::filter(news_articles, grepl("2014", Text, fixed = TRUE))

Before diving into the biases, the word cloud below gives an overview of the people, places and events related to 2014, the year of the kidnapping.
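A word cloud sizes each word by how often it occurs, so the underlying data is a word-frequency table. A minimal base R sketch, assuming a small hypothetical stop-word list and illustrative sentences in place of the 2014 articles; it does not reproduce the plotting package used for the figure.

```r
# Minimal sketch (assumption): word frequencies after removing a small,
# hypothetical stop-word list. The sentences are illustrative only.
texts <- c("GAStech employees missing after the IPO celebration",
           "POK suspected in the disappearance of GAStech employees",
           "The police question a GAStech employee")
stop_words <- c("the", "a", "in", "of", "after")

words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
words <- words[words != "" & !(words %in% stop_words)]
word_freq <- sort(table(words), decreasing = TRUE)
head(word_freq, 3)
```

Note that "employee" and "employees" are counted separately here; stemming or lemmatisation would merge them.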

Topic modelling of the news sources was performed to give an overview of their biases through the topics they cover.
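The write-up does not show its modelling code. LDAvis, mentioned earlier, visualises Latent Dirichlet Allocation (LDA) models, so the sketch below shows what fitting LDA involves, using a from-scratch collapsed Gibbs sampler in base R. It is an illustration under stated assumptions, not the author's pipeline: the lda_gibbs function, the number of topics K, the priors alpha and beta, the iteration count and the toy documents are all hypothetical choices.

```r
# Minimal sketch (assumption): LDA fitted with a collapsed Gibbs sampler.
# docs is a list of character vectors (already tokenised documents).
lda_gibbs <- function(docs, K, alpha = 0.1, beta = 0.01, iters = 200, seed = 1) {
  set.seed(seed)
  vocab <- sort(unique(unlist(docs)))
  V <- length(vocab)
  w <- lapply(docs, match, vocab)               # word ids per document
  z <- lapply(w, function(ws) sample.int(K, length(ws), replace = TRUE))
  ndk <- matrix(0, length(docs), K)             # topic counts per document
  nkw <- matrix(0, K, V)                        # word counts per topic
  nk <- numeric(K)                              # total words per topic
  for (d in seq_along(w)) for (i in seq_along(w[[d]])) {
    k <- z[[d]][i]; v <- w[[d]][i]
    ndk[d, k] <- ndk[d, k] + 1; nkw[k, v] <- nkw[k, v] + 1; nk[k] <- nk[k] + 1
  }
  for (it in seq_len(iters)) {
    for (d in seq_along(w)) for (i in seq_along(w[[d]])) {
      k <- z[[d]][i]; v <- w[[d]][i]
      # Remove the current assignment from the counts
      ndk[d, k] <- ndk[d, k] - 1; nkw[k, v] <- nkw[k, v] - 1; nk[k] <- nk[k] - 1
      # Full conditional p(z = k | all other assignments), up to a constant
      p <- (ndk[d, ] + alpha) * (nkw[, v] + beta) / (nk + V * beta)
      k <- sample.int(K, 1, prob = p)
      z[[d]][i] <- k
      ndk[d, k] <- ndk[d, k] + 1; nkw[k, v] <- nkw[k, v] + 1; nk[k] <- nk[k] + 1
    }
  }
  phi <- (nkw + beta) / (nk + V * beta)         # topic-word distributions
  colnames(phi) <- vocab
  theta <- (ndk + alpha) / rowSums(ndk + alpha) # document-topic distributions
  list(phi = phi, theta = theta)
}

docs <- list(
  c("ipo", "jet", "executives", "ipo", "jet"),
  c("pok", "apa", "kidnap", "pok", "apa"),
  c("jet", "executives", "ipo", "wealth"),
  c("apa", "pok", "kidnap", "arise")
)
fit <- lda_gibbs(docs, K = 2)
# Top three words per topic (one column per topic)
apply(fit$phi, 1, function(row) names(sort(row, decreasing = TRUE))[1:3])
```

In practice a package implementation would be used on the real corpus; the top words per topic are what the topic summaries below are read from.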

The topics are summarised below:

Topic 1: Missing GAStech employees left the city with their newfound wealth from the IPO.
Topic 2: The kidnappers are linked to POK and APA.
Topic 3: The kidnappers are linked to an increasingly “anarchist” POK.
Topic 4: Outsiders dressed in black are the suspects. Some comments say they were “lurking approximately” when the fire alarm went off. They were also the suppliers for the breakfast meeting on the morning of the kidnapping.

The heatmap below shows the topic, and the corresponding bias, into which each news source falls.

Topic 1: The bias is favourable towards the government. The keywords “stone unturned” come from data sources justifying the mistake of the police force or government in wrongly detaining Edvard Vann, whose family name led to a confusion of identity.

The keyword Danisliau refers to the fueler Ravi Danislau, who said he saw two private jets carrying “business types” leave the Abila airport.

Overall, the data sources in this topic seem biased against the GAStech executives, suggesting that they escaped on their private jets with their IPO wealth.

Topic 2: The bias is against the APA. Data sources in this category make a number of references to the APA and its related activities, suggesting that the kidnapping could be the work of the Asterian People’s Army (APA) together with POK.

Topic 3: Data sources in this category view the Kronos government and GAStech executives rather negatively; the keyword kleptocracy suggests corruption involving Kronos public officials and GAStech executives. These sources believe that a kidnapping may have occurred because POK has become increasingly anarchist, which indicates that they may have been more supportive of POK before it became more radicalized.

Topic 4: Data sources in this category are not biased against POK, the APA or the GAStech executives, and instead rely on the testimonials of GAStech employees. Because employees reported seeing unknown people dressed in black, these people become this group’s main suspects. A GAStech coordinator stated that they were the suppliers/caterers for the breakfast on the morning of the kidnapping, and some sources suggested investigating them.

Qn 3: Given the data sources provided, use visual analytics to identify potential official and unofficial relationships among GASTech, POK, the APA, and Government. Include both personal relationships and shared goals and objectives. Provide evidence for these relationships.

Personal relationships among GAStech employees

Non-work-related emails between colleagues at GAStech can help uncover personal relationships among the employees, as a higher frequency of such emails suggests a closer personal relationship.
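Counting these emails per pair of employees, and per day of the week, is a simple aggregation. A minimal base R sketch, assuming the email headers have been tabulated with a sender, a recipient, a date and a work/non-work category; these column names and all rows are hypothetical placeholders, not the challenge data.

```r
# Minimal sketch (assumption): count non-work emails per unordered pair and
# per weekday. Column names and rows are hypothetical.
emails <- data.frame(
  from = c("emp_a", "emp_b", "emp_a", "emp_c", "emp_a"),
  to   = c("emp_b", "emp_a", "emp_b", "emp_a", "emp_c"),
  date = as.Date(c("2014-01-06", "2014-01-07", "2014-01-08",
                   "2014-01-08", "2014-01-12")),
  category = c("non-work", "non-work", "non-work", "work", "non-work"),
  stringsAsFactors = FALSE
)

personal <- emails[emails$category == "non-work", ]
# Sort the two names so that a->b and b->a count as the same pair
personal$pair <- paste(pmin(personal$from, personal$to),
                       pmax(personal$from, personal$to), sep = " - ")
# ISO weekday number (1 = Monday, 7 = Sunday), independent of locale
personal$weekday <- as.integer(format(personal$date, "%u"))

pair_counts <- table(personal$pair)
pair_days <- table(personal$pair, personal$weekday)
pair_counts
pair_days
```

The pair-by-weekday table is the shape of data behind a heatmap of who writes to whom on which days; a weekend entry (weekday 6 or 7) stands out immediately.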

There are two pairs of relationships in the above diagram that are particularly striking.

One is between Rachel Pantanal (id: 38), Assistant to CIO, and Isia Vann (id: 43), Perimeter Control. They appear close, as they exchange non-work-related emails on most days of the week (Monday, Tuesday, Wednesday and Friday). Closer inspection reveals an email asking whether Rachel liked the flowers, to which she responded positively, suggesting a potential romantic relationship between them. According to the historical documents, Isia Vann holds a personal grudge against GAStech over the death of his sister and is among the more radical-minded members of POK. If he is in a romantic relationship with Rachel, she may be sympathetic to his causes.

The other is Rachel Pantanal (id: 38) and Ruscella Mies Haber (id: 33), who exchanged a non-work-related email on a Sunday. GAStech employees very rarely send each other non-work-related emails on weekends, so this is an outlier. The data reveals that the email was titled “RE: FW: ARISE - Inspiration for Defenders of Kronos”.

ARISE is a publication of the Asterian People’s Army (APA), a paramilitary organization that has engaged in terrorist activities funded through criminal enterprises, including drug trafficking, and that has been associated with POK. Please refer to the data table diagram below.

The fact that the email was sent on a Sunday may also indicate some urgency in the lead-up to the kidnapping.

Exchange of the suspicious “ARISE - Inspiration for Defenders of Kronos” email

The suspicious “ARISE” email was exchanged among the group of colleagues shown above. Rachel first sent it to Ruscella, after which Hennie, Ruscella, Loreto, Isia, Inga and Minke exchanged it, suggesting an ongoing conversation about the email within the group.
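Tracing a thread like this comes down to filtering on the subject line and collecting everyone who sent or received a matching email, in time order. A minimal base R sketch; the column names (from, to, subject, sent) and all rows are hypothetical placeholders for the email headers.

```r
# Minimal sketch (assumption): list the participants of every email whose
# subject mentions a keyword, in the order the emails were sent.
emails <- data.frame(
  from = c("emp_a", "emp_b", "emp_c", "emp_d"),
  to   = c("emp_b", "emp_c", "emp_b", "emp_a"),
  subject = c("FW: ARISE - newsletter", "RE: FW: ARISE - newsletter",
              "RE: FW: ARISE - newsletter", "Lunch on Friday?"),
  sent = as.POSIXct(c("2014-01-12 10:00", "2014-01-13 09:15",
                      "2014-01-13 11:40", "2014-01-13 12:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

thread <- emails[grepl("ARISE", emails$subject, fixed = TRUE), ]
thread <- thread[order(thread$sent), ]
participants <- unique(c(thread$from, thread$to))
participants
```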

As Rachel is in the administration department, investigators may find more clues from her departmental colleagues: she frequently exchanges non-work-related emails with them, suggesting close relationships.

The filter for this diagram is non-work-related emails sent after office hours.
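The "after office hours" condition can be written as a small predicate on the send time. A minimal base R sketch; the office hours of 08:00 to 17:00 on weekdays are a hypothetical choice, since the write-up does not state the cut-off it used, and the is_after_hours name is likewise illustrative.

```r
# Minimal sketch (assumption): an email is "after office hours" if it is sent
# on a weekend or outside start_hour-end_hour on a weekday.
is_after_hours <- function(sent, start_hour = 8, end_hour = 17) {
  hour <- as.integer(format(sent, "%H"))
  weekday <- as.integer(format(sent, "%u"))  # 1 = Monday, 7 = Sunday
  weekday >= 6 | hour < start_hour | hour >= end_hour
}

sent <- as.POSIXct(c("2014-01-08 10:30", "2014-01-08 19:05",
                     "2014-01-12 11:00"), tz = "UTC")
is_after_hours(sent)  # FALSE TRUE TRUE (Wednesday morning, Wednesday evening, Sunday)
```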

Isia Vann exchanges non-work-related emails with Rachel Pantanal (Assistant to CIO, Tethys citizenship), Claudio Hawelon (Truck Driver, Kronos citizenship), Mat Bramar (Assistant to CEO, Tethys citizenship) and Inga Ferro (Site Control, Kronos citizenship). Although Claudio did not receive the “ARISE” email, police investigators may still want to interview him, since he appears closer to Isia than most other colleagues in the company, especially given their non-work-related email exchanges after office hours.

The filter for this diagram is non-work-related emails sent after office hours.

Clusters of relationships by department

The image below shows non-work-related email exchanges outside office hours, with at least 2 exchanges in the 14-day period. Four clusters have such exchanges. One is the facilities department, among its own members. Another is the IT technicians, without the IT manager. The executive cluster comprises the CEO, CIO and Environmental Safety Advisor.
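Finding such clusters can be framed as thresholding the pair counts and then taking the connected components of the remaining graph. A minimal base R sketch using a breadth-first search; it assumes the "at least 2 exchanges" threshold applies per pair of employees, and the edge list is hypothetical.

```r
# Minimal sketch (assumption): keep pairs with at least 2 exchanges, then
# label connected components with a breadth-first search.
edges <- data.frame(
  a = c("emp_a", "emp_b", "emp_d", "emp_f", "emp_a"),
  b = c("emp_b", "emp_c", "emp_e", "emp_g", "emp_g"),
  n = c(3, 2, 4, 1, 2),
  stringsAsFactors = FALSE
)
edges <- edges[edges$n >= 2, ]  # at least 2 exchanges in the period

nodes <- sort(unique(c(edges$a, edges$b)))
cluster <- setNames(rep(NA_integer_, length(nodes)), nodes)
next_id <- 0L
for (start in nodes) {
  if (!is.na(cluster[[start]])) next          # already in a cluster
  next_id <- next_id + 1L
  queue <- start
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]
    if (!is.na(cluster[[node]])) next
    cluster[[node]] <- next_id
    # Neighbours in either direction that have not been labelled yet
    neighbours <- c(edges$b[edges$a == node], edges$a[edges$b == node])
    queue <- c(queue, neighbours[is.na(cluster[neighbours])])
  }
}
clusters <- split(names(cluster), cluster)
clusters
```

Here emp_f drops out entirely because its only link has a single exchange, which is how the threshold removes weak ties before clusters are read off.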

The fourth is the administration cluster, which connects to some members of the security department through Rachel and Isia. This again highlights an anomaly, as colleagues usually form closer informal relationships within their own department than outside it. The relationship between Rachel and Isia warrants further investigation.

It is also interesting to note that, according to the news articles, Edvard Vann, a security guard at GAStech, was questioned for hours on suspicion of involvement in the kidnapping because his name resembles that of a Protectors of Kronos member. However, the diagram above reveals that Edvard is not connected to the group that exchanged the suspicious “ARISE” email, nor does he correspond with any other colleagues outside office hours, suggesting that he does not have strong personal relationships with his colleagues. Unless he uses other modes of communication, this suggests that he may indeed be unrelated to the suspected Protectors of Kronos member.

Relationships with the government (military)

An honorable discharge occurs when a military service member receives a good or excellent rating for their service time by exceeding standards for performance and personal conduct. If a service member’s performance is satisfactory but the individual fails to meet all expectations of conduct for military members, the discharge is considered a General Discharge. To receive a General Discharge from the military, there has to be some form of nonjudicial punishment to correct unacceptable military behavior or failure to meet military standards.[1]

It is worth investigating why most of the suspected group received a General Discharge and whether it was due to extreme ideologies or goals. With the exception of Ruscella, who is much older, all of the suspected members (Hennie, Isia, Loreto, Inga and Minke) overlapped in service with at least one other member of the group. This means that they likely had some unofficial relationships or contact with each other in the Army before joining GAStech. Their “General Discharge” rather than “Honorable Discharge” could also reflect their relationship with the Kronos government (i.e. concerns that the Kronos government has flagged about them).
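The "overlap in service" check is an interval-intersection test: two service periods overlap when each starts no later than the other ends. A minimal base R sketch; the people and service years are hypothetical placeholders, not values taken from the resumes.

```r
# Minimal sketch (assumption): two service intervals [start, end] overlap
# when start_1 <= end_2 and start_2 <= end_1. Data are hypothetical.
service <- data.frame(
  person = c("emp_a", "emp_b", "emp_c"),
  start  = c(2001, 2004, 1985),
  end    = c(2006, 2009, 1990),
  stringsAsFactors = FALSE
)

pairs <- t(combn(nrow(service), 2))  # every pair of rows once
overlaps <- data.frame(
  person1 = service$person[pairs[, 1]],
  person2 = service$person[pairs[, 2]],
  overlap = service$start[pairs[, 1]] <= service$end[pairs[, 2]] &
            service$start[pairs[, 2]] <= service$end[pairs[, 1]],
  stringsAsFactors = FALSE
)
overlaps
```

An overlap only shows that two people could have met in service; whether they served in the same unit would need the resume details.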

As Rachel is a Tethys citizen, she did not serve in the Armed Forces of Kronos and therefore does not appear in this diagram.

Relationships between APA, POK and GAStech

The visualization below maps the potential official and unofficial relationships between the GAStech employees and the APA and POK.

Four of the six members of the suspected group mentioned earlier have connections to the APA and POK through family members, assuming that a shared family name indicates a family tie rather than a mere coincidence. Family members may share common ideologies and affiliations, so this point needs further investigation.
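Matching on shared family names can be sketched as a join on surname. A minimal base R illustration, assuming the family name is the last word of a full name (an assumption that may not hold for every naming convention in Kronos); the family_name helper and all names are hypothetical placeholders. As the text notes, a match is only a lead, not proof of kinship.

```r
# Minimal sketch (assumption): join employees to known POK/APA members on
# family name, taken as the last word of the full name.
family_name <- function(full_name) {
  vapply(strsplit(full_name, "\\s+"), function(p) p[length(p)], character(1))
}

employees <- c("Ana Ortiz", "Ben Lee", "Cara Novak")
members <- data.frame(name = c("Dan Ortiz", "Eva Novak"),
                      group = c("POK", "APA"),
                      stringsAsFactors = FALSE)

matches <- merge(
  data.frame(employee = employees, surname = family_name(employees),
             stringsAsFactors = FALSE),
  data.frame(member = members$name, group = members$group,
             surname = family_name(members$name), stringsAsFactors = FALSE),
  by = "surname"
)
matches
```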

When the visualizations above are combined, they point towards the suspected group of GAStech employees, who share a common goal through their multiple connections, especially with POK and/or the APA. This appears to be the most likely of all the scenarios.

However, it is also possible that the four GAStech executives claimed to be missing or kidnapped were actually on an impromptu personal golf vacation to celebrate their windfall from the IPO. The diagram below shows the email exchanges containing “vacation” in the week before the supposed “kidnapping”. The executives might simply have gone on vacation to celebrate and were not kidnapped at all.
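Selecting the emails for such a diagram combines a keyword match with a date window. A minimal base R sketch; the rows, the column names and the window end date are hypothetical placeholders, and matching the keyword in the subject line only is an assumption.

```r
# Minimal sketch (assumption): emails whose subject mentions a keyword within
# the 7 days before a hypothetical cut-off date.
emails <- data.frame(
  from = c("emp_a", "emp_b", "emp_c"),
  subject = c("Golf vacation plans", "Quarterly report",
              "RE: Golf vacation plans"),
  date = as.Date(c("2014-01-10", "2014-01-12", "2014-01-15")),
  stringsAsFactors = FALSE
)

window_end <- as.Date("2014-01-16")  # hypothetical end of the window
window_start <- window_end - 7
hits <- emails[grepl("vacation", emails$subject, ignore.case = TRUE) &
                 emails$date >= window_start & emails$date < window_end, ]
hits
```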

This matches the police’s correction of the number of missing GAStech employees from fourteen to ten. POK might have issued a ransom note to take advantage of the situation, but may or may not actually be involved in the kidnapping despite its claims.

It is also possible that the suppliers dressed in black kidnapped the other ten employees to obtain a ransom from Sten Sanjorge Jr, CEO of GAStech, who became a billionaire after the company’s IPO. They may not have been fully aware of the executives’ vacation plans and hence failed to kidnap them, instead capturing ten other employees during their planned time window amid the chaos caused by the fire alarm. The suppliers’ affiliation is not clear, but it is likely that they got into the GAStech building with the help of a GAStech administrative executive who contracted them to cater the reunion breakfast between the Kronos government and GAStech executives.

Police investigators are advised to investigate the potential scenarios above and look into the possible suspects mentioned.

References


[1] https://themilitarywallet.com/types-of-military-discharges/